These files contain complete loan data for all loans issued from 2007 through 2015, including the current loan status ('Current', 'Late', 'Fully Paid', etc.) and the latest payment information. Additional features include credit scores, the number of finance inquiries, address information (zip code and state), and collections, among others. The file is a matrix of about 890 thousand observations and 75 variables. Here we use a previously transformed version of the data set, which nevertheless remains a full copy of the original. For more information, or if you want to download these data, consult:
In [1]:
# Required Libraries
import os
import pandas as pd
import numpy as np
In [2]:
# Path Definitions of Required Data Sets
loan_df_path = os.path.join('/media/ML_HOME/ML-Data_Repository/data', 'loan_df')
us_states_GeoJSON = os.path.join('/media/ML_HOME/ML-Data_Repository/maps', 'us_states-albersUSA-Geo.json')
Here we provide two choropleth maps showing how the Loan Book Value and the Loan Book Volume are distributed across the U.S. States. To build them, we use three ingredients: the "Bokeh" Python library; a GeoJSON file that defines the U.S. State boundaries, produced from a cartographic boundary shapefile provided on the official site of the U.S. Census Bureau; and the Pandas DataFrame grouped_agg_df, in which we aggregate the number and the value of loans per U.S. State. "Bokeh" is a Python library for interactive, browser-based visualizations in the style of D3.
In [3]:
# Load the Data Set of interest
loan_df = pd.read_pickle(loan_df_path)
In [4]:
# A quick look at the available data set
# (newer pandas versions replace null_counts with show_counts)
loan_df.info(null_counts=True)
In [5]:
# Compute the "Loan Book Amount & Volume" per "US State"
grouped = loan_df.groupby(by=['addr_state'])
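# Loan book amount: total loan amount per state
grouped_agg = (grouped[['loan_amnt']].agg(np.sum)
               .rename(columns={'loan_amnt': 'loanbook_amnt_per_state'}))
# Loan book volume: number of loans per state (counted as the non-zero loan amounts)
grouped_agg['loanbook_vol_per_state'] = grouped['loan_amnt'].agg(np.count_nonzero)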
grouped_agg_df = grouped_agg.reset_index()
grouped_agg_df.head()
Out[5]:
In [6]:
# Serialize the "grouped_agg_df" Data Frame as JSON records (first five rows shown here).
# These records have already been joined into the GeoJSON Data Source, "us_states_GeoJSON", that we use below.
grouped_agg_df[:5].to_json(orient='records')
Out[6]:
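The join of the per-state records into the GeoJSON file is mentioned above but not shown in this notebook. As a minimal sketch of one way it could be done, assume that each GeoJSON feature carries a `state` property (the field the hover tooltips read) holding the same code as `addr_state`. The helper name `join_measures_into_geojson` and the toy inputs are hypothetical and only serve as an illustration.

```python
import json


def join_measures_into_geojson(geojson_text, records,
                               feature_key="state", record_key="addr_state"):
    """Copy each record's measures into the properties of the matching feature.

    feature_key and record_key are assumptions about how states are identified
    on each side; adjust them to the actual files.
    """
    geo = json.loads(geojson_text)
    by_state = {rec[record_key]: rec for rec in records}
    for feature in geo["features"]:
        # GeoJSON allows "properties" to be null, so normalise it to a dict
        props = feature.get("properties") or {}
        feature["properties"] = props
        rec = by_state.get(props.get(feature_key))
        if rec is not None:
            props.update({k: v for k, v in rec.items() if k != record_key})
    return json.dumps(geo)


# Toy inputs (hypothetical values, not taken from the loan data)
toy_geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"state": "CA"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
        {"type": "Feature", "properties": {"state": "NV"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 0]]]}},
    ],
})
toy_records = [{"addr_state": "CA",
                "loanbook_amnt_per_state": 1000.0,
                "loanbook_vol_per_state": 2}]

joined = json.loads(join_measures_into_geojson(toy_geojson, toy_records))
```

With the real data, the records could come from `json.loads(grouped_agg_df.to_json(orient='records'))`. Features without a matching record are left unchanged, so it is worth checking that every state received its measures before plotting.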
In [7]:
# Load the necessary libraries for the "Bokeh" Visualization
from bokeh.io import show, output_notebook
from bokeh.palettes import (
    YlOrRd9 as palette1,
    YlGnBu9 as palette2)
from bokeh.plotting import figure
from bokeh.models import (
    GeoJSONDataSource,
    LogColorMapper,
    HoverTool,
    LogTicker,
    ColorBar)
# Load the enriched GeoJSON Data Source, with the loanbook measures of interest
with open(us_states_GeoJSON, 'r') as f:
    geo_source = GeoJSONDataSource(geojson=f.read())
# Output the Choropleth Plots in Notebook
output_notebook()
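# PROVIDE THE CHOROPLETH OF "LOAN BOOK AMOUNT PER STATE"
# Use a reversed copy of the palette: reversing in place would flip it again
# every time the cell is re-run, and fails when the palette is a tuple
color_mapper = LogColorMapper(palette=palette1[::-1],
                              low=grouped_agg_df['loanbook_amnt_per_state'].min(),
                              high=grouped_agg_df['loanbook_amnt_per_state'].max())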
# Define the figure "Tools" we want to make available
TOOLS = "pan, wheel_zoom, reset, hover, save"
# Plot the figure
# Define the figure dimensions and its general details
# (plot_width/plot_height are the older Bokeh names; Bokeh 3.x uses width/height)
p = figure(title="Loan Book Value by U.S. States", tools=TOOLS,
           plot_width=960, plot_height=500,
           x_range=(0, 960), y_range=(500, 0),
           x_axis_location=None, y_axis_location=None)
# Render the U.S. States as "Bokeh" patches glyphs, colored by loan book amount
p.patches('xs', 'ys', source=geo_source,
          fill_color={'field': 'loanbook_amnt_per_state', 'transform': color_mapper},
          fill_alpha=0.7, line_color="white", line_width=0.5)
# Add a Hover Tool over the U.S. States
hover = p.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = [
    ("State", "@state"),
    ("Loan Book Amount", "@loanbook_amnt_per_state{,.2f} USD"),
    ("(Long, Lat)", "($x, $y)"),
]
# Add a ColorBar Legend
color_bar = ColorBar(color_mapper=color_mapper, ticker=LogTicker(),
                     background_fill_alpha=0.7,
                     label_standoff=5,
                     major_label_text_color='black',
                     major_tick_line_color='black', major_tick_line_width=1.3, major_tick_out=5,
                     border_line_color=None, location=(0, 0),
                     orientation='horizontal', width=500)
p.add_layout(color_bar, 'above')
show(p)
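Both maps color the states through a LogColorMapper rather than a linear one. The sketch below illustrates the idea behind that choice: when values span several orders of magnitude, log-scale binning spreads them across the palette, whereas linear binning pushes most of them into the first color. It is an illustration of the principle only and is not claimed to reproduce Bokeh's exact binning; the function names and the example values are hypothetical.

```python
import math


def log_bin_index(value, low, high, n_colors):
    """Palette index of value on a log scale between low and high (clamped)."""
    if low <= 0 or high <= low:
        raise ValueError("log scale needs 0 < low < high")
    if value <= low:
        return 0
    if value >= high:
        return n_colors - 1
    frac = math.log(value / low) / math.log(high / low)
    return min(int(frac * n_colors), n_colors - 1)


def linear_bin_index(value, low, high, n_colors):
    """Palette index of value on a linear scale between low and high (clamped)."""
    if value <= low:
        return 0
    if value >= high:
        return n_colors - 1
    frac = (value - low) / (high - low)
    return min(int(frac * n_colors), n_colors - 1)


# Hypothetical range of nine decades and a 9-color palette
low, high, n_colors = 1.0, 1e9, 9
print(log_bin_index(5e4, low, high, n_colors))     # 4
print(linear_bin_index(5e4, low, high, n_colors))  # 0
```

In this toy range each decade falls into its own color on the log scale, while the linear scale would leave every value below about 1.1e8 in the first bin.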
In [8]:
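# PROVIDE THE CHOROPLETH OF "LOAN BOOK VOLUME PER STATE"
# Use a reversed copy of the palette: reversing in place would flip it again
# every time the cell is re-run, and fails when the palette is a tuple
color_mapper = LogColorMapper(palette=palette2[::-1],
                              low=grouped_agg_df['loanbook_vol_per_state'].min(),
                              high=grouped_agg_df['loanbook_vol_per_state'].max())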
# Define the figure "Tools" we want to make available
TOOLS = "pan, wheel_zoom, reset, hover, save"
# Plot the figure
# Define the figure dimensions and its general details
# (plot_width/plot_height are the older Bokeh names; Bokeh 3.x uses width/height)
p = figure(title="Loan Book Volume by U.S. States", tools=TOOLS,
           plot_width=960, plot_height=500,
           x_range=(0, 960), y_range=(500, 0),
           x_axis_location=None, y_axis_location=None)
# Render the U.S. States as "Bokeh" patches glyphs, colored by loan book volume
p.patches('xs', 'ys', source=geo_source,
          fill_color={'field': 'loanbook_vol_per_state', 'transform': color_mapper},
          fill_alpha=0.7, line_color="white", line_width=0.5)
# Add a Hover Tool over the U.S. States
hover = p.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = [
    ("State", "@state"),
    ("Loan Book Volume", "@loanbook_vol_per_state{,}"),
    ("(Long, Lat)", "($x, $y)"),
]
# Add a ColorBar Legend
color_bar = ColorBar(color_mapper=color_mapper, ticker=LogTicker(),
                     background_fill_alpha=0.7,
                     label_standoff=5,
                     major_label_text_color='black',
                     major_tick_line_color='black', major_tick_line_width=1.3, major_tick_out=5,
                     border_line_color=None, location=(0, 0),
                     orientation='horizontal', width=500)
p.add_layout(color_bar, 'above')
show(p)